{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Optional Lab - Regularized Cost and Gradient"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab, you will:\n",
    "- extend the previous linear and logistic cost functions with a regularization term.\n",
    "- rerun the previous example of over-fitting with a regularization term added.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "%matplotlib widget\n",
    "import matplotlib.pyplot as plt\n",
    "from plt_overfit import overfit_example, output\n",
    "from lab_utils_common import sigmoid\n",
    "np.set_printoptions(precision=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adding regularization\n",
    "<img align=\"Left\" src=\"./images/C1_W3_LinearGradientRegularized.png\"  style=\" width:400px; padding: 10px; \" >\n",
    "<img align=\"Center\" src=\"./images/C1_W3_LogisticGradientRegularized.png\"  style=\" width:400px; padding: 10px; \" >\n",
    "\n",
    "The slides above show the cost and gradient functions for both linear and logistic regression. Note:\n",
    "- Cost\n",
    "    - The cost functions differ significantly between linear and logistic regression, but adding regularization to the equations is the same.\n",
    "- Gradient\n",
    "    - The gradient functions for linear and logistic regression are very similar. They differ only in the implementation of $f_{wb}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cost functions with regularization\n",
    "### Cost function for regularized linear regression\n",
    "\n",
    "The equation for the cost function regularized linear regression is:\n",
    "$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2  + \\frac{\\lambda}{2m}  \\sum_{j=0}^{n-1} w_j^2 \\tag{1}$$ \n",
    "where:\n",
    "$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b  \\tag{2} $$ \n",
    "\n",
    "\n",
    "Compare this to the cost function without regularization (which you implemented in  a previous lab), which is of the form:\n",
    "\n",
    "$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 $$ \n",
    "\n",
    "The difference is the regularization term,  <span style=\"color:blue\">\n",
    "    $\\frac{\\lambda}{2m}  \\sum_{j=0}^{n-1} w_j^2$ </span> \n",
    "    \n",
    "Including this term incentives gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice.\n",
    "\n",
    "Below is an implementation of equations (1) and (2). Note that this uses a *standard pattern for this course*,   a `for loop` over all `m` examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):\n",
    "    \"\"\"\n",
    "    Computes the cost over all examples\n",
    "    Args:\n",
    "      X (ndarray (m,n): Data, m examples with n features\n",
    "      y (ndarray (m,)): target values\n",
    "      w (ndarray (n,)): model parameters  \n",
    "      b (scalar)      : model parameter\n",
    "      lambda_ (scalar): Controls amount of regularization\n",
    "    Returns:\n",
    "      total_cost (scalar):  cost \n",
    "    \"\"\"\n",
    "\n",
    "    m  = X.shape[0]\n",
    "    n  = len(w)\n",
    "    cost = 0.\n",
    "    for i in range(m):\n",
    "        f_wb_i = np.dot(X[i], w) + b                                   #(n,)(n,)=scalar, see np.dot\n",
    "        cost = cost + (f_wb_i - y[i])**2                               #scalar             \n",
    "    cost = cost / (2 * m)                                              #scalar  \n",
    " \n",
    "    reg_cost = 0\n",
    "    for j in range(n):\n",
    "        reg_cost += (w[j]**2)                                          #scalar\n",
    "    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar\n",
    "    \n",
    "    total_cost = cost + reg_cost                                       #scalar\n",
    "    return total_cost                                                  #scalar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell below to see it in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "X_tmp = np.random.rand(5,6)\n",
    "y_tmp = np.array([0,1,0,1,0])\n",
    "w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5\n",
    "b_tmp = 0.5\n",
    "lambda_tmp = 0.7\n",
    "cost_tmp = compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
    "\n",
    "print(\"Regularized cost:\", cost_tmp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected Output**:\n",
    "<table>\n",
    "  <tr>\n",
    "    <td> <b>Regularized cost: </b> 0.07917239320214275 </td>\n",
    "  </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cost function for regularized logistic regression\n",
    "For regularized **logistic** regression, the cost function is of the form\n",
    "$$J(\\mathbf{w},b) = \\frac{1}{m}  \\sum_{i=0}^{m-1} \\left[ -y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) \\right] + \\frac{\\lambda}{2m}  \\sum_{j=0}^{n-1} w_j^2 \\tag{3}$$\n",
    "where:\n",
    "$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = sigmoid(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b)  \\tag{4} $$ \n",
    "\n",
    "Compare this to the cost function without regularization (which you implemented in  a previous lab):\n",
    "\n",
    "$$ J(\\mathbf{w},b) = \\frac{1}{m}\\sum_{i=0}^{m-1} \\left[ (-y^{(i)} \\log\\left(f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right) - \\left( 1 - y^{(i)}\\right) \\log \\left( 1 - f_{\\mathbf{w},b}\\left( \\mathbf{x}^{(i)} \\right) \\right)\\right] $$\n",
    "\n",
    "As was the case in linear regression above, the difference is the regularization term, which is    <span style=\"color:blue\">\n",
    "    $\\frac{\\lambda}{2m}  \\sum_{j=0}^{n-1} w_j^2$ </span> \n",
    "\n",
    "Including this term incentives gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):\n",
    "    \"\"\"\n",
    "    Computes the cost over all examples\n",
    "    Args:\n",
    "    Args:\n",
    "      X (ndarray (m,n): Data, m examples with n features\n",
    "      y (ndarray (m,)): target values\n",
    "      w (ndarray (n,)): model parameters  \n",
    "      b (scalar)      : model parameter\n",
    "      lambda_ (scalar): Controls amount of regularization\n",
    "    Returns:\n",
    "      total_cost (scalar):  cost \n",
    "    \"\"\"\n",
    "\n",
    "    m,n  = X.shape\n",
    "    cost = 0.\n",
    "    for i in range(m):\n",
    "        z_i = np.dot(X[i], w) + b                                      #(n,)(n,)=scalar, see np.dot\n",
    "        f_wb_i = sigmoid(z_i)                                          #scalar\n",
    "        cost +=  -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)      #scalar\n",
    "             \n",
    "    cost = cost/m                                                      #scalar\n",
    "\n",
    "    reg_cost = 0\n",
    "    for j in range(n):\n",
    "        reg_cost += (w[j]**2)                                          #scalar\n",
    "    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar\n",
    "    \n",
    "    total_cost = cost + reg_cost                                       #scalar\n",
    "    return total_cost                                                  #scalar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell below to see it in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "X_tmp = np.random.rand(5,6)\n",
    "y_tmp = np.array([0,1,0,1,0])\n",
    "w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5\n",
    "b_tmp = 0.5\n",
    "lambda_tmp = 0.7\n",
    "cost_tmp = compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
    "\n",
    "print(\"Regularized cost:\", cost_tmp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected Output**:\n",
    "<table>\n",
    "  <tr>\n",
    "    <td> <b>Regularized cost: </b> 0.6850849138741673 </td>\n",
    "  </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient descent with regularization\n",
    "The basic algorithm for running gradient descent does not change with regularization, it is:\n",
    "$$\\begin{align*}\n",
    "&\\text{repeat until convergence:} \\; \\lbrace \\\\\n",
    "&  \\; \\; \\;w_j = w_j -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1}  \\; & \\text{for j := 0..n-1} \\\\ \n",
    "&  \\; \\; \\;  \\; \\;b = b -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\\\\n",
    "&\\rbrace\n",
    "\\end{align*}$$\n",
    "Where each iteration performs simultaneous updates on $w_j$ for all $j$.\n",
    "\n",
    "What changes with regularization is computing the gradients."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Computing the Gradient with regularization (both linear/logistic)\n",
    "The gradient calculation for both linear and logistic regression are nearly identical, differing only in computation of $f_{\\mathbf{w}b}$.\n",
    "$$\\begin{align*}\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}  &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)}  +  \\frac{\\lambda}{m} w_j \\tag{2} \\\\\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}  &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3} \n",
    "\\end{align*}$$\n",
    "\n",
    "* m is the number of training examples in the data set      \n",
    "* $f_{\\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target\n",
    "\n",
    "      \n",
    "* For a  <span style=\"color:blue\"> **linear** </span> regression model  \n",
    "    $f_{\\mathbf{w},b}(x) = \\mathbf{w} \\cdot \\mathbf{x} + b$  \n",
    "* For a <span style=\"color:blue\"> **logistic** </span> regression model  \n",
    "    $z = \\mathbf{w} \\cdot \\mathbf{x} + b$  \n",
    "    $f_{\\mathbf{w},b}(x) = g(z)$  \n",
    "    where $g(z)$ is the sigmoid function:  \n",
    "    $g(z) = \\frac{1}{1+e^{-z}}$   \n",
    "    \n",
    "The term which adds regularization is  the <span style=\"color:blue\">$\\frac{\\lambda}{m} w_j $</span>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gradient function for regularized linear regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_linear_reg(X, y, w, b, lambda_): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    "    Args:\n",
    "      X (ndarray (m,n): Data, m examples with n features\n",
    "      y (ndarray (m,)): target values\n",
    "      w (ndarray (n,)): model parameters  \n",
    "      b (scalar)      : model parameter\n",
    "      lambda_ (scalar): Controls amount of regularization\n",
    "      \n",
    "    Returns:\n",
    "      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
    "      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. \n",
    "    \"\"\"\n",
    "    m,n = X.shape           #(number of examples, number of features)\n",
    "    dj_dw = np.zeros((n,))\n",
    "    dj_db = 0.\n",
    "\n",
    "    for i in range(m):                             \n",
    "        err = (np.dot(X[i], w) + b) - y[i]                 \n",
    "        for j in range(n):                         \n",
    "            dj_dw[j] = dj_dw[j] + err * X[i, j]               \n",
    "        dj_db = dj_db + err                        \n",
    "    dj_dw = dj_dw / m                                \n",
    "    dj_db = dj_db / m   \n",
    "    \n",
    "    for j in range(n):\n",
    "        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]\n",
    "\n",
    "    return dj_db, dj_dw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell below to see it in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "X_tmp = np.random.rand(5,3)\n",
    "y_tmp = np.array([0,1,0,1,0])\n",
    "w_tmp = np.random.rand(X_tmp.shape[1])\n",
    "b_tmp = 0.5\n",
    "lambda_tmp = 0.7\n",
    "dj_db_tmp, dj_dw_tmp =  compute_gradient_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
    "\n",
    "print(f\"dj_db: {dj_db_tmp}\", )\n",
    "print(f\"Regularized dj_dw:\\n {dj_dw_tmp.tolist()}\", )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected Output**\n",
    "```\n",
    "dj_db: 0.6648774569425726\n",
    "Regularized dj_dw:\n",
    " [0.29653214748822276, 0.4911679625918033, 0.21645877535865857]\n",
    " ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gradient function for regularized logistic regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_logistic_reg(X, y, w, b, lambda_): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X (ndarray (m,n): Data, m examples with n features\n",
    "      y (ndarray (m,)): target values\n",
    "      w (ndarray (n,)): model parameters  \n",
    "      b (scalar)      : model parameter\n",
    "      lambda_ (scalar): Controls amount of regularization\n",
    "    Returns\n",
    "      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w. \n",
    "      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b. \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    dj_dw = np.zeros((n,))                            #(n,)\n",
    "    dj_db = 0.0                                       #scalar\n",
    "\n",
    "    for i in range(m):\n",
    "        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar\n",
    "        err_i  = f_wb_i  - y[i]                       #scalar\n",
    "        for j in range(n):\n",
    "            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar\n",
    "        dj_db = dj_db + err_i\n",
    "    dj_dw = dj_dw/m                                   #(n,)\n",
    "    dj_db = dj_db/m                                   #scalar\n",
    "\n",
    "    for j in range(n):\n",
    "        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]\n",
    "\n",
    "    return dj_db, dj_dw  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell below to see it in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "X_tmp = np.random.rand(5,3)\n",
    "y_tmp = np.array([0,1,0,1,0])\n",
    "w_tmp = np.random.rand(X_tmp.shape[1])\n",
    "b_tmp = 0.5\n",
    "lambda_tmp = 0.7\n",
    "dj_db_tmp, dj_dw_tmp =  compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)\n",
    "\n",
    "print(f\"dj_db: {dj_db_tmp}\", )\n",
    "print(f\"Regularized dj_dw:\\n {dj_dw_tmp.tolist()}\", )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected Output**\n",
    "```\n",
    "dj_db: 0.341798994972791\n",
    "Regularized dj_dw:\n",
    " [0.17380012933994293, 0.32007507881566943, 0.10776313396851499]\n",
    " ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Rerun over-fitting example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.close(\"all\")\n",
    "display(output)\n",
    "ofit = overfit_example(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the plot above, try out regularization on the previous example. In particular:\n",
    "- Categorical (logistic regression)\n",
    "    - set degree to 6, lambda to 0 (no regularization), fit the data\n",
    "    - now set lambda to 1 (increase regularization), fit the data, notice the difference.\n",
    "- Regression (linear regression)\n",
    "    - try the same procedure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Congratulations!\n",
    "You have:\n",
    "- examples of cost and gradient routines with regression added for both linear and logistic regression\n",
    "- developed some intuition on how regularization can reduce over-fitting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
